How to do an ANOVA when your data are non

2023-05-09 10:18| 来源: 网络整理| 查看: 265

The data, as indicated by the variable names in the linked-to spreadsheet, pertains to number of organisms in 13 groups, so are some kind of count data. We could do better here if we knew some more about the data! But I do not agree with the answer by @gung that we should use some count data model, like Poisson regression. I will show graphically why I say that.

The Poisson distribution have variance equal to the mean, so a simple first analysis is to plot (empirical) variances against means. After reading the data into a data.frame (in R) I did:

> summary(dat) Number Group Min. :1.000 13 : 90 1st Qu.:3.000 1 : 76 Median :4.000 4 : 70 Mean :3.826 3 : 65 3rd Qu.:5.000 6 : 62 Max. :8.000 12 : 62 (Other):299 > s2 m plot(m, s2, ylim=c(1.5, 5.5)) > abline(0, 1, col="red")

enter image description here

The red line shows variance equal to the mean. We see that all the points are below this line, and there is not much evidence that the variance increases with the mean either. So this is not Poisson data. Also, while the variances vary, they do not vary too much (all are within a factor of two), so the usual equal-variance Anova $F$ test can probably be used, as it is reasonably robust against small variations in the variance. The classic book by Miller: "Beyond Anova" gives a factor of four as acceptable. So let us try Anova in R, with and without that assumption:

> oneway.test(Number ~ Group, data=dat, var.equal=FALSE) One-way analysis of means (not assuming equal variances) data: Number and Group F = 10.265, num df = 11.00, denom df = 265.99, p-value = 1.29e-15 > oneway.test(Number ~ Group, data=dat, var.equal=TRUE) One-way analysis of means data: Number and Group F = 9.4014, num df = 11, denom df = 712, p-value = 6.497e-16

Both give basically the same conclusion here: the null hypothesis of equality of means is rejected. Note that the $F$ ratios are very similar.

Let us also show a graphical analysis, a boxplot with "notches" showing confidence intervals for the median:

> boxplot(Number ~ Group, data=dat, notch=TRUE, col="blue", varwidth=TRUE) Warning message: In bxp(list(stats = c(1, 2, 3, 5, 7, 2, 4, 5, 6, 7, 1, 3, 5, 6, : some notches went outside hinges ('box'): maybe set notch=FALSE

We see that some of the confidence intervals do not overlap at all, again consistent with the Anova results. I would not be overly concerned with problems with the assumptions underlying Anova with this data, so the conclusion is quite robust. If wanted, it could be further analyzed with the bootstrap, but I will leave that to others. If further analysis is wanted, we would need more information about the data.

enter image description here

【本文地址】

公司简介

联系我们